Precision landing remains a challenge in autonomous drone flight, with no widespread solution. Fiducial markers provide a computationally cheap way for a drone to locate a landing pad and execute a precision landing autonomously. However, most work in this field depends on a fixed, downward-facing camera, which limits the drone's ability to detect the marker. We present an autonomous landing method that uses a gimbal-mounted camera to search for the landing pad quickly, by simply rotating in place while tilting the camera up and down, and to keep the camera aimed at the landing pad continuously during approach and landing. The method demonstrates successful search, tracking, and landing with 4 of the 5 tested fiducial systems on a physical drone, without human intervention. For each fiducial system, we present the number of successful and unsuccessful landings, together with the distribution of distances from the drone to the center of the landing pad after each successful landing, and statistical comparisons between the systems. We also show representative examples of flight trajectories, marker-tracking performance, and the control outputs of each channel during landing. Finally, we discuss the qualitative strengths and weaknesses underlying each system's performance.
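A minimal sketch of the described search-then-track behavior, assuming a hypothetical drone/gimbal interface (`set_yaw_rate`, `set_gimbal_pitch`, `detect_marker`, and `adjust_gimbal` are illustrative names, not from the paper):

```python
# Hypothetical drone/gimbal interface; method and parameter names are
# illustrative, not taken from the paper.
class GimbalLandingController:
    def __init__(self, drone, camera):
        self.drone = drone      # assumed to expose set_yaw_rate(rad/s)
        self.camera = camera    # assumed to expose gimbal control + detection

    def search(self, yaw_rate=0.5, pitch_min=-90.0, pitch_max=-20.0, step=5.0):
        """Rotate in place while sweeping the gimbal pitch up and down
        until the landing-pad marker is detected."""
        pitch = pitch_min
        while True:
            detection = self.camera.detect_marker()
            if detection is not None:
                return detection
            self.drone.set_yaw_rate(yaw_rate)        # spin in place
            pitch += step
            if pitch >= pitch_max or pitch <= pitch_min:
                step = -step                         # reverse the sweep
            self.camera.set_gimbal_pitch(pitch)

    def track(self, detection, gain=0.1):
        """During approach, steer the gimbal toward the marker's offset
        from the image center so the camera stays aimed at the pad."""
        dx, dy = detection.image_offset              # normalized, in [-1, 1]
        self.camera.adjust_gimbal(d_yaw=gain * dx, d_pitch=-gain * dy)
```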
Fiducial markers provide a computationally cheap way for a drone to determine its position relative to a landing pad and execute a precision landing. However, most existing work in this field uses a fixed, downward-facing camera, which cannot take advantage of the gimbal-mounted camera setups common on many drones. Such rigid systems cannot easily track a detected marker and can lose sight of it under non-ideal conditions (e.g., wind gusts). This paper evaluates the AprilTag and WhyCode fiducial systems for drone landing with a gimbal-mounted monocular camera, which gives the drone system the advantage of being able to track the marker over time. However, because the camera's orientation changes, the marker's orientation must be known, and orientation estimates are unreliable in monocular fiducial systems. Furthermore, the systems must be fast. We propose 2 methods to mitigate WhyCode's orientation ambiguity, and 1 method to increase AprilTag's runtime detection rate. We evaluate our 3 systems against the 2 default systems in terms of marker orientation ambiguity and detection rate. We test marker detection rates in a ROS framework on a Raspberry Pi 4 and rank the systems by performance. Our first WhyCode variant significantly reduces orientation ambiguity with an insignificant change in detection rate. Our second WhyCode variant does not differ significantly from the default WhyCode system in orientation ambiguity, but it does provide additional functionality for multi-marker WhyCode bundle arrangements. Our AprilTag variant does not show a performance improvement on the Raspberry Pi 4.
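The orientation ambiguity at issue is the classic two-solution pose ambiguity of planar targets. As an illustration (not one of the paper's proposed methods), OpenCV's `solvePnPGeneric` returns both pose hypotheses for a square marker, and temporal consistency can break the tie:

```python
import cv2
import numpy as np

def estimate_marker_pose(obj_pts, img_pts, K, dist, prev_rvec=None):
    """Planar square markers admit two pose hypotheses; this picks the one
    most consistent with the previous frame. An illustrative heuristic,
    not the paper's method. obj_pts must be the 4 marker corners in the
    order required by SOLVEPNP_IPPE_SQUARE."""
    n, rvecs, tvecs, errs = cv2.solvePnPGeneric(
        obj_pts, img_pts, K, dist, flags=cv2.SOLVEPNP_IPPE_SQUARE)
    if prev_rvec is None or n == 1:
        return rvecs[0], tvecs[0]   # solutions come sorted by reprojection error
    # Keep the hypothesis whose rotation is closest to the previous estimate.
    best = min(range(n), key=lambda i: np.linalg.norm(rvecs[i] - prev_rvec))
    return rvecs[best], tvecs[best]
```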
We describe a Physics-Informed Neural Network (PINN) that simulates the flow induced by the astronomical tide in a synthetic port channel, with dimensions based on the Santos-São Vicente-Bertioga Estuarine System. PINN models aim to combine the knowledge of physical systems and data-driven machine learning models. This is done by training a neural network to minimize the residuals of the governing equations at sample points. In this work, our flow is governed by the Navier-Stokes equations with some approximations. There are two main novelties in this paper. First, we design our model to assume that the flow is periodic in time, which is not feasible in conventional simulation methods. Second, we evaluate the benefit of resampling the function evaluation points during training, which has a near-zero computational cost and has been verified to improve the final model, especially for small batch sizes. Finally, we discuss some limitations of the approximations used in the Navier-Stokes equations regarding the modeling of turbulence and how it interacts with PINNs.
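A minimal PyTorch sketch of the two ideas under stated simplifications (2D incompressible flow; the y-momentum equation and boundary/data terms are omitted for brevity; the tidal frequency, domain scaling, and network sizes are illustrative): periodicity is built in by feeding the network (sin ωt, cos ωt) instead of raw t, and collocation points are redrawn at every training step:

```python
import torch
import torch.nn as nn

OMEGA = 2 * torch.pi / 12.42  # M2 tidal period in hours -- illustrative

class PeriodicPINN(nn.Module):
    """Periodicity in time is enforced by construction: the network sees
    (sin wt, cos wt) instead of raw t, so outputs repeat every period."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 3))  # outputs: u, v, p

    def forward(self, x, y, t):
        inp = torch.stack(
            [x, y, torch.sin(OMEGA * t), torch.cos(OMEGA * t)], dim=-1)
        return self.net(inp)

def residual_loss(model, x, y, t, nu=1e-3):
    """Residuals of a simplified incompressible Navier-Stokes system via
    autograd (x-momentum + continuity only; illustrative)."""
    x, y, t = (w.requires_grad_(True) for w in (x, y, t))
    u, v, p = model(x, y, t).unbind(-1)
    grad = lambda f, w: torch.autograd.grad(f.sum(), w, create_graph=True)[0]
    u_t, u_x, u_y = grad(u, t), grad(u, x), grad(u, y)
    p_x, v_y = grad(p, x), grad(v, y)
    mom_x = u_t + u * u_x + v * u_y + p_x - nu * (grad(u_x, x) + grad(u_y, y))
    cont = u_x + v_y                     # incompressibility
    return (mom_x ** 2).mean() + (cont ** 2).mean()

model = PeriodicPINN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(5000):
    # Resampling: fresh collocation points every step, at near-zero cost.
    x, y, t = (torch.rand(256) for _ in range(3))
    opt.zero_grad()
    loss = residual_loss(model, x, y, t)
    loss.backward()
    opt.step()
```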
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
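The patch-based strategy most respondents used can be as simple as non-overlapping tiling; a sketch with illustrative patch and stride sizes:

```python
import numpy as np

def extract_patches(image, patch=256, stride=256):
    """Tile a large image into fixed-size patches -- the most commonly
    reported workaround for samples too large to process at once."""
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            patches.append(image[top:top + patch, left:left + patch])
    return np.stack(patches)
```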
Topic modeling is widely used for analytically evaluating large collections of textual data. One of the most popular topic modeling techniques is Latent Dirichlet Allocation (LDA), which is flexible and adaptive, but not optimal for, e.g., short texts from various domains. We explore how the state-of-the-art BERTopic algorithm performs on short multi-domain text and find that it generalizes better than LDA in terms of topic coherence and diversity. We further analyze the performance of the HDBSCAN clustering algorithm utilized by BERTopic and find that it classifies a majority of the documents as outliers. This crucial, yet overlooked problem excludes too many documents from further analysis. When we replace HDBSCAN with k-Means, we achieve similar performance, but without outliers.
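Swapping the clusterer is a one-line change in BERTopic, whose `hdbscan_model` argument accepts any model with `fit`/`predict` (the `n_clusters` value below is an illustrative choice):

```python
from bertopic import BERTopic
from sklearn.cluster import KMeans

docs = [...]  # the short multi-domain texts to model

# k-Means assigns every document to a cluster, so no outlier topic (-1)
# swallows a majority of the corpus, unlike the HDBSCAN default.
topic_model = BERTopic(hdbscan_model=KMeans(n_clusters=50, random_state=42))
topics, probs = topic_model.fit_transform(docs)  # probs may be None here
```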
With the progress of sensor technology in wearables, the collection and analysis of PPG signals are gaining more interest. Using Machine Learning, the cardiac rhythm corresponding to PPG signals can be used to predict different tasks such as activity recognition, sleep stage detection, or more general health status. However, supervised learning is often limited by the amount of available labeled data, which is typically expensive to obtain. To address this problem, we propose a Self-Supervised Learning (SSL) method with a pretext task of signal reconstruction to learn an informative generalized PPG representation. The performance of the proposed SSL framework is compared with two fully supervised baselines. The results show that in a very limited label data setting (10 samples per class or less), using SSL is beneficial, and a simple classifier trained on SSL-learned representations outperforms fully supervised deep neural networks. However, the results reveal that the SSL-learned representations are too focused on encoding the subjects. Unfortunately, there is high inter-subject variability in the SSL-learned representations, which makes working with this data more challenging when labeled data is scarce. The high inter-subject variability suggests that there is still room for improvements in learning representations. In general, the results suggest that SSL may pave the way for the broader use of machine learning models on PPG data in label-scarce regimes.
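A minimal sketch of the reconstruction pretext task (window length and architecture sizes are assumptions, not the paper's): an encoder compresses a PPG window into a representation z, a decoder reconstructs the window, and the trained encoder is later reused by a small classifier:

```python
import torch
import torch.nn as nn

class PPGAutoencoder(nn.Module):
    """Signal-reconstruction pretext task: encode a PPG window, then
    reconstruct it; the encoder is reused downstream. Sizes illustrative."""
    def __init__(self, win=512, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, dim))
        self.decoder = nn.Sequential(
            nn.Linear(dim, win), nn.Unflatten(1, (1, win)))

    def forward(self, x):              # x: (batch, 1, win)
        z = self.encoder(x)            # learned representation
        return self.decoder(z), z

model = PPGAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(16, 1, 512)           # stand-in for unlabeled PPG windows
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)
loss.backward(); opt.step()
```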
Satellite image analysis has important implications for land use, urbanization, and ecosystem monitoring. Deep learning methods can facilitate the analysis of different satellite modalities, such as electro-optical (EO) and synthetic aperture radar (SAR) imagery, by supporting knowledge transfer between the modalities to compensate for individual shortcomings. Recent progress has shown how distributional alignment of neural network embeddings can produce powerful transfer learning models by employing a sliced Wasserstein distance (SWD) loss. We analyze how this method can be applied to Sentinel-1 and -2 satellite imagery and develop several extensions toward making it effective in practice. In an application to few-shot Local Climate Zone (LCZ) prediction, we show that these networks outperform multiple common baselines on datasets with a large number of classes. Further, we provide evidence that instance normalization can significantly stabilize the training process and that explicitly shaping the embedding space using supervised contrastive learning can lead to improved performance.
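A common formulation of the sliced Wasserstein distance between two embedding batches (the projection count and squared form are illustrative choices; the paper's exact setup may differ):

```python
import torch

def sliced_wasserstein(x, y, n_proj=128):
    """SWD between two embedding batches of equal size: project onto
    random unit directions, sort, and compare the 1-D distributions."""
    d = x.size(1)
    theta = torch.randn(d, n_proj, device=x.device)
    theta = theta / theta.norm(dim=0, keepdim=True)  # unit directions
    px = (x @ theta).sort(dim=0).values              # sorted 1-D projections
    py = (y @ theta).sort(dim=0).values
    return ((px - py) ** 2).mean()
```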
Front-door adjustment is a classic technique to estimate causal effects from a specified directed acyclic graph (DAG) and observed data. The advantage of this approach is that it uses observed mediators to identify causal effects, which is possible even in the presence of unobserved confounding. While the statistical properties of front-door estimation are quite well understood, its algorithmic aspects remained unexplored for a long time. Recently, Jeong, Tian, and Barenboim [NeurIPS 2022] presented the first polynomial-time algorithm for finding sets satisfying the front-door criterion in a given DAG, with an $O(n^3(n+m))$ run time, where $n$ denotes the number of variables and $m$ the number of edges of the graph. In our work, we give the first linear-time, i.e., $O(n+m)$, algorithm for this task, which thus reaches the asymptotically optimal time complexity, as the size of the input is $\Omega(n+m)$. We also provide an algorithm to enumerate all front-door adjustment sets in a given DAG with delay $O(n(n + m))$. These results improve on the algorithms by Jeong et al. [2022] for the two tasks by a factor of $n^3$ each.
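For reference, the identification result that a front-door set enables is the standard front-door adjustment formula (a textbook result, stated here for a single treatment $X$, mediator set $M$, and outcome $Y$):

$$P(y \mid do(x)) \;=\; \sum_{m} P(m \mid x) \sum_{x'} P(y \mid m, x')\, P(x')$$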
The existence of metallic implants in projection images for cone-beam computed tomography (CBCT) introduces undesired artifacts which degrade the quality of reconstructed images. In order to reduce metal artifacts, projection inpainting is an essential step in many metal artifact reduction algorithms. In this work, a hybrid network combining the shifted-window (Swin) vision transformer (ViT) and a convolutional neural network is proposed as a baseline network for the inpainting task. To incorporate metal information into the Swin ViT-based encoder, metal-conscious self-embedding and neighborhood-embedding methods are investigated. Both methods improve the performance of the baseline network. Furthermore, with an appropriate window size, the model with neighborhood embedding achieves the lowest mean absolute error of 0.079 in metal regions and the highest peak signal-to-noise ratio of 42.346 in CBCT projections. Finally, the efficiency of metal-conscious embedding is demonstrated on both simulated and real cadaver CBCT data, where it enhances the inpainting capability of the baseline network.
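One simple way to make a ViT-style patch embedding "metal-conscious", sketched as an assumption-laden illustration (the paper's self- and neighborhood-embedding variants differ in detail): append the binary metal trace mask as an extra input channel so every patch token carries local metal information:

```python
import torch
import torch.nn as nn

class MetalAwarePatchEmbed(nn.Module):
    """Illustrative metal-conscious patch embedding: the binary metal mask
    is concatenated to the projection image before patch projection, so
    each token encodes whether its region is affected by metal."""
    def __init__(self, patch=4, dim=96):
        super().__init__()
        # 1 projection channel + 1 metal-mask channel
        self.proj = nn.Conv2d(2, dim, kernel_size=patch, stride=patch)

    def forward(self, projection, metal_mask):
        x = torch.cat([projection, metal_mask], dim=1)   # (B, 2, H, W)
        return self.proj(x).flatten(2).transpose(1, 2)   # (B, N_patches, dim)
```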
Advances in reinforcement learning (RL) often rely on massive compute resources and remain notoriously sample inefficient. In contrast, the human brain is able to efficiently learn effective control strategies using limited resources. This raises the question of whether insights from neuroscience can be used to improve current RL methods. Predictive processing is a popular theoretical framework which maintains that the human brain is actively seeking to minimize surprise. We show that recurrent neural networks which predict their own sensory states can be leveraged to minimize surprise, yielding substantial gains in cumulative reward. Specifically, we present the Predictive Processing Proximal Policy Optimization (P4O) agent: an actor-critic reinforcement learning agent that applies predictive processing to a recurrent variant of the PPO algorithm by integrating a world model in its hidden state. P4O significantly outperforms a baseline recurrent variant of the PPO algorithm on multiple Atari games using a single GPU. It also outperforms other state-of-the-art agents given the same wall-clock time and exceeds human gamer performance on multiple games including Seaquest, which is a particularly challenging environment in the Atari domain. Altogether, our work underscores how insights from the field of neuroscience may support the development of more capable and efficient artificial agents.
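A sketch of the core mechanism under assumed dimensions: a recurrent actor-critic with an extra head that predicts the next observation from the hidden state, whose error acts as an auxiliary "surprise" loss alongside the usual PPO objective (this illustrates the general idea, not the P4O implementation):

```python
import torch
import torch.nn as nn

class PredictiveRecurrentPolicy(nn.Module):
    """Recurrent actor-critic with a world-model head that predicts the
    next observation from the hidden state; its error serves as an
    auxiliary surprise-minimization loss. Dimensions are illustrative."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.rnn = nn.GRUCell(obs_dim, hidden)
        self.policy = nn.Linear(hidden, act_dim)   # actor logits
        self.value = nn.Linear(hidden, 1)          # critic
        self.predict = nn.Linear(hidden, obs_dim)  # world-model head

    def forward(self, obs, h):
        h = self.rnn(obs, h)
        return self.policy(h), self.value(h), self.predict(h), h

# Auxiliary loss: mismatch between predicted and actual next observation.
model = PredictiveRecurrentPolicy(obs_dim=8, act_dim=4)
h = torch.zeros(1, 256)
obs, next_obs = torch.randn(1, 8), torch.randn(1, 8)
logits, value, pred_next, h = model(obs, h)
surprise = nn.functional.mse_loss(pred_next, next_obs)
# total_loss = ppo_loss + beta * surprise   (beta is an assumed coefficient)
```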